We have taken the liberty of grouping and rewording some of the reviewers' comments (in blue italics) to save space.

3 General answer on the usefulness of gradient descent, its theoretical guarantees, and its scalability

We thank the reviewers for the time they spent evaluating our manuscript and for their valuable comments. We agree that having theoretical guarantees would be a big plus. As for scalability, the bottleneck of our method is the single-linkage algorithm. Similarly to Monath et al. (NeurIPS 2017), our idea consists [...] Given the significant body of additional material, we feel that this topic is best left to a future publication.

Lines 8, 56, 70, 93: I would suggest a more cautious use of the word "equivalent".
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
Sikui Zhang, Guangze Gao, Ziyun Gan, Chunfeng Yuan, Zefeng Lin, Houwen Peng, Bing Li, Weiming Hu
Large language models (LLMs) experience significant performance degradation when the input exceeds the pretraining context window, primarily due to the out-of-distribution (OOD) behavior of Rotary Position Embedding (RoPE). Recent studies mitigate this problem by remapping OOD positions into the in-distribution range with fixed mapping strategies, ignoring the dynamic relationship between input length and the model's effective context window. To this end, we propose Length-aware Multi-grained Positional Encoding (LaMPE), a training-free method that fully utilizes the model's effective context window for adaptive long-context scaling in LLMs. Motivated by the left-skewed frequency distribution of relative positions, LaMPE establishes a dynamic relationship between mapping length and input length through a parametric scaled sigmoid function to adaptively allocate positional capacity across varying input lengths. Meanwhile, LaMPE devises a novel multi-grained attention mechanism that strategically allocates positional resolution across different sequence regions to capture both fine-grained locality and long-range dependencies. Our method can be seamlessly applied to a wide range of RoPE-based LLMs without training. Extensive experiments on three representative LLMs across five mainstream long-context benchmarks demonstrate that LaMPE achieves significant performance improvements compared to existing length extrapolation methods. The code will be released at https://github.com/scar-on/LaMPE.
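The abstract's core idea, tying mapping length to input length via a parametric scaled sigmoid, can be sketched as follows. This is a minimal illustration in the spirit of the description, not LaMPE's actual formula: the parameter names and default values (`k`, `midpoint`, `floor`) are assumptions chosen only to show the shape of such a mapping.

```python
import math

def mapping_length(input_len, effective_window, k=1e-4, midpoint=16384, floor=0.25):
    """Illustrative length-aware mapping (not the paper's exact function).

    The fraction of the effective context window allocated to remapped
    positions grows smoothly and monotonically with input length:
    short inputs use a small fraction, very long inputs approach 1.
    """
    # Plain sigmoid in (0, 1), centered at `midpoint` tokens.
    s = 1.0 / (1.0 + math.exp(-k * (input_len - midpoint)))
    # Rescale into [floor, 1) so even short inputs get some capacity.
    frac = floor + (1.0 - floor) * s
    return int(round(frac * effective_window))
```

Because the output is bounded by `effective_window`, remapped positions always stay in the in-distribution range regardless of how long the input grows.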
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Asia > China > Beijing > Beijing (0.04)
Unraveling the Black-box Magic: An Analysis of Neural Networks' Dynamic Local Extrema
We point out that neural networks are not black boxes, and that their generalization stems from the ability to dynamically map a dataset to the local extrema of the model function. We further prove that the number of local extrema in a neural network is positively correlated with the number of its parameters, and on this basis we present a new algorithm, distinct from back-propagation, which we call the extremum-increment algorithm. Some difficult situations, such as vanishing gradients and overfitting, can be reasonably explained and addressed within this framework.
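The claimed correlation between parameter count and the number of local extrema can be probed numerically. The sketch below (my own illustration, not the paper's method) counts sign changes of the numerical derivative of a random one-hidden-layer tanh network on a 1-D input; the widths, weight scales, and input range are arbitrary assumptions.

```python
import numpy as np

def count_local_extrema(width, seed=0, n=20001):
    """Count local extrema of a random 1-hidden-layer tanh net on [-3, 3].

    A local extremum of the scalar function x -> y(x) is detected as a
    sign change in the finite-difference derivative. Wider networks
    (more parameters) tend to exhibit more extrema.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 4.0, size=width)  # input -> hidden weights
    b1 = rng.normal(0.0, 4.0, size=width)  # hidden biases
    w2 = rng.normal(0.0, 1.0, size=width)  # hidden -> output weights
    x = np.linspace(-3.0, 3.0, n)
    y = np.tanh(np.outer(x, w1) + b1) @ w2  # network output, shape (n,)
    dy = np.diff(y)                         # finite-difference derivative
    # Count positions where the derivative flips sign.
    return int(np.sum(np.sign(dy[:-1]) != np.sign(dy[1:])))
```

A width-1 network is a single (monotone) tanh, so it has no interior extrema, while a wide random network typically has many; this matches the abstract's correlation claim qualitatively.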